Review for NeurIPS paper: ICAM: Interpretable Classification via Disentangled Representations and Feature Attribution Mapping
This paper proposes a model for simultaneous classification and feature attribution in the context of medical image classification. The model uses a GAN to learn two representations from pairs (x, y) of input images drawn from different classes. One representation is class-relevant (z_a, a for attribution) and the other is class-irrelevant (z_c, c for content). The class-relevant representation is used for classification, and both representations are fed to a generator G that synthesizes images so as to achieve domain translation.
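To make the summary above concrete, the following is a minimal NumPy sketch of the swap-based translation scheme the paper describes. All components here are illustrative stand-ins: random linear maps play the role of the trained encoders (E_a for the class-relevant code, E_c for the content code) and the generator G, and the dimensions are arbitrary. The point is only to show how swapping z_a between two images yields a translation, and how a feature attribution (FA) map falls out as a per-pixel difference.

```python
import numpy as np

rng = np.random.default_rng(0)

D, ZA, ZC = 64, 8, 16  # image dim, attribution / content latent sizes (illustrative)

# Hypothetical linear stand-ins for the trained networks: E_a extracts the
# class-relevant (attribution) code z_a, E_c the class-irrelevant content
# code z_c, and G maps (z_a, z_c) back to image space.
W_a = rng.normal(size=(ZA, D)) / np.sqrt(D)
W_c = rng.normal(size=(ZC, D)) / np.sqrt(D)
W_g = rng.normal(size=(D, ZA + ZC)) / np.sqrt(ZA + ZC)

def E_a(x):
    return W_a @ x                      # class-relevant code z_a

def E_c(x):
    return W_c @ x                      # class-irrelevant code z_c

def G(z_a, z_c):
    return W_g @ np.concatenate([z_a, z_c])  # generator: codes -> image

x_A = rng.normal(size=D)                # image from class A
x_B = rng.normal(size=D)                # image from class B

# Domain translation: keep x_A's content but swap in x_B's attribution code.
x_A_to_B = G(E_a(x_B), E_c(x_A))

# FA map: per-pixel difference between the translation and the
# same-class reconstruction of x_A, highlighting class-relevant changes.
fa_map = x_A_to_B - G(E_a(x_A), E_c(x_A))
print(fa_map.shape)
```

In the actual model the encoders and generator are deep networks trained adversarially with reconstruction and classification losses; this sketch only captures the information flow (two codes in, one image out, attribution by difference).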
Feature attribution (FA), or the assignment of class relevance to different locations in an image, is important for many classification problems but is particularly crucial within the neuroscience domain, where accurate mechanistic models of behaviours or disease require knowledge of all features discriminative of a trait. At the same time, predicting class relevance from brain images is challenging, as phenotypes are typically heterogeneous and changes occur against a background of significant natural variation. Here, we present a novel framework for creating class-specific FA maps through image-to-image translation. We propose the use of a VAE-GAN to explicitly disentangle class relevance from background features for improved interpretability, which results in meaningful FA maps. We show that FA maps generated by our method outperform those of baseline FA methods when validated against ground truth. More significantly, our approach is the first to use latent space sampling to support exploration of phenotype variation.